{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "PageRank.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iWOZV94kYsbM",
        "colab_type": "text"
      },
      "source": [
        "在实际应用中许多数据都以图(graph)的形式存在，比如，互联网、社交网络都可以看作是一个图。图数据上的机器学习具有理论与应用上的重要意义。pageRank算法是图的链接分析 (link analysis)的代表性算法，属于图数据上的无监督学习方法。  \n",
        "\n",
        "pageRank算法最初作为互联网网页重要度的计算方法，1996年由page和Brin提出，并用于谷歌搜索引擎的网页排序。事实上，pageRank可以定义在任意有向图上，后来被应用到社会影响力分析、文本摘要等多个问题。  \n",
        "\n",
        "pageRank算法的基本想法是在有向图上定义一个随机游走模型，即一阶马尔可夫链，描述随机游走者沿着有向图随机访问各个结点的行为。在一定条件下，极限情况访问每个结点的概率收敛到平稳分布， 这时各个结点的平稳概率值就是其 pageRank值，表示结点的重要度。 pageRank是递归定义的，pageRank的计算可以通过迭代算法进行。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fAN4q0cqYn-f",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "#https://gist.github.com/diogojc/1338222/84d767a68da711a154778fb1d00e772d65322187\n",
        "\n",
        "import numpy as np\n",
        "from scipy.sparse import csc_matrix\n",
        "\n",
        "def pageRank(G, s = .85, maxerr = .0001):\n",
        "    \"\"\"\n",
        "    Computes the pagerank for each of the n states\n",
        "    Parameters\n",
        "    ----------\n",
        "    G: matrix representing state transitions\n",
        "       Gij is a binary value representing a transition from state i to j.\n",
        "    s: probability of following a transition. 1-s probability of teleporting\n",
        "       to another state.\n",
        "    maxerr: if the sum of pageranks between iterations is bellow this we will\n",
        "            have converged.\n",
        "    \"\"\"\n",
        "    n = G.shape[0]\n",
        "\n",
        "    # transform G into markov matrix A\n",
        "    A = csc_matrix(G,dtype=np.float)\n",
        "    rsums = np.array(A.sum(1))[:,0]\n",
        "    ri, ci = A.nonzero()\n",
        "    A.data /= rsums[ri]\n",
        "\n",
        "    # bool array of sink states\n",
        "    sink = rsums==0\n",
        "\n",
        "    # Compute pagerank r until we converge\n",
        "    ro, r = np.zeros(n), np.ones(n)\n",
        "    while np.sum(np.abs(r-ro)) > maxerr:\n",
        "        ro = r.copy()\n",
        "        # calculate each pagerank at a time\n",
        "        for i in range(0,n):\n",
        "            # inlinks of state i\n",
        "            Ai = np.array(A[:,i].todense())[:,0]\n",
        "            # account for sink states\n",
        "            Di = sink / float(n)\n",
        "            # account for teleportation to state i\n",
        "            Ei = np.ones(n) / float(n)\n",
        "\n",
        "            r[i] = ro.dot( Ai*s + Di*s + Ei*(1-s) )\n",
        "\n",
        "    # return normalized pagerank\n",
        "    return r/float(sum(r))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Ds-wQEFFZ1F7",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "outputId": "b2860902-8712-4583-ab47-bec602c6791b"
      },
      "source": [
        "# Example extracted from 'Introduction to Information Retrieval'\n",
        "G = np.array([[0,0,1,0,0,0,0],\n",
        "              [0,1,1,0,0,0,0],\n",
        "              [1,0,1,1,0,0,0],\n",
        "              [0,0,0,1,1,0,0],\n",
        "              [0,0,0,0,0,0,1],\n",
        "              [0,0,0,0,0,1,1],\n",
        "              [0,0,0,1,1,0,1]])\n",
        "print(pageRank(G,s=.86))"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "[0.12727557 0.03616954 0.12221594 0.22608452 0.28934412 0.03616954\n",
            " 0.16274076]\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}